# Public transit ridership refers to the number of people who use public transportation services, such as buses, trains, subways, trams, and other forms of mass transit, to travel from one place to another. It is an important metric to assess the usage and popularity of public transportation systems in a given area. Understanding public transit ridership is crucial for transportation authorities, city planners, and policymakers because it can help inform decisions about funding, service improvements, and the overall effectiveness of a public transit system. 

library(readr)
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2   3.5.1      ✔ fma       2.5   
## ✔ forecast  8.24.0     ✔ expsmooth 2.3
## Warning: package 'forecast' was built under R version 4.4.1
## 
library(TTR)

TRANSIT <- read_csv("TRANSIT.csv")
## Rows: 283 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (1): TRANSIT
## date (1): DATE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Transit_Raw <- TRANSIT$TRANSIT
Transit_ts <- ts(Transit_Raw, frequency = 12, start = c(2000,1))
plot(Transit_ts)

# Plot and Inference
# Time Series Plot
plot(Transit_ts,
     main = paste("Time Series Plot for Usage of Public Transportation Services"),
     xlab = "Year",
     ylab = "Public Transportation Service Usage",
     col = "black",
     lwd = 2)

# Observation: The number of people using public transportation stayed steady oscillating around 800,000, however in 2020 the number of people using public transport suddenly declined. This is due to the onset of COVID-19 and many people either losing their jobs or switching to work from home, which led to the decline.

# Central Tendency
summary(Transit_ts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  171450  772502  824766  779435  866164  993437
boxplot(Transit_ts,
        main = "Boxplot for the Usage of Public Transportation Services",
        ylab = "Value")

# Observation: Public transportation usage was stable and seasonally yet mildly fluctuating until 2020. There was a dramatic decline starting around 2020 due to COVID-19 pandemic and reduced mobility, which led to the huge number of lower outliers observed in the boxplot during the 2020-2021. This distribution would be negatively skewed (left-skewed) because of the sharp dip values. Post-COVID recovery after 2021 has been gradually increasing but overall it is still staying before the pre-2020 levels, this can be due to the permanent change in the work culture and most people working on hybrid schedules. The median for the boxplot is ~824,766 users indicating the central typical level of public transportation usage and the IQR between about 772,502 (Q1) and 866,164 (Q3), showing moderate variability around the median.

# Decomposition
stl_decomp <- stl(Transit_ts,s.window ="periodic")
plot(stl_decomp)

# The time series is seasonal because it does have seasonal fluctuations displayed by the regular, repeating oscillations. The public transportation usage follows a consistent cyclical pattern over time with similar rises and falls occurring at regular intervals of the seasonal component. The amplitude of the seasonal fluctuations is also relatively stable, even though the overall trend drops sharply around 2020.

decomp_transit <- decompose(Transit_ts) 
decomp_transit
## $x
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 2000 724934 756536 843730 757458 818246 793911 743746 787064 802516 851255
## 2001 800293 761213 846702 803744 850680 807301 785014 809941 768513 871470
## 2002 794010 756102 820151 824757 838274 772040 782849 781609 805016 877109
## 2003 778679 732135 820157 810703 799984 754298 767411 745756 807125 830771
## 2004 740829 750265 843276 803337 781133 784558 754430 772963 814446 845387
## 2005 756467 750973 842424 816425 806149 795101 753939 808031 858809 858662
## 2006 793138 761980 874469 806944 870445 827752 790273 840294 850328 907084
## 2007 838203 796569 934752 884098 993437 876764 860196 908061 903028 946754
## 2008 852130 835901 889940 908295 912568 884081 910651 890570 929094 971677
## 2009 827360 806287 897860 874039 856958 848878 843229 825429 875511 917949
## 2010 788830 746114 893899 869986 853057 842674 819152 844979 864777 888591
## 2011 793606 780318 907771 856803 880563 861412 824766 843294 891667 919419
## 2012 843287 859851 913814 869911 900348 851686 848905 893636 877531 924987
## 2013 864280 813159 888516 907515 908342 851490 868345 888314 900754 972814
## 2014 832033 812275 913232 920884 923901 874651 886186 887484 931357 986733
## 2015 818017 779983 907646 901828 871727 878666 890344 840937 897493 953141
## 2016 796025 832503 907384 871083 879877 867550 809114 864510 884140 889002
## 2017 802693 777821 886230 834677 875749 842737 790814 837928 844867 910849
## 2018 788672 766031 841125 829115 857964 822283 797080 833069 816289 921343
## 2019 773669 740735 832480 853038 861455 800261 822014 835988 846274 915445
## 2020 815015 788599 505660 171450 200586 256196 306701 318609 331020 351338
## 2021 292833 275803 348362 352403 374081 409969 423379 428647 463998 500798
## 2022 392741 427514 512795 507034 517434 522856 495785 533257 563825 583121
## 2023 525669 512739 593360 560127 609180 572834 532206                     
##         Nov    Dec
## 2000 816120 755040
## 2001 820117 756221
## 2002 788746 754257
## 2003 724971 758869
## 2004 797405 767658
## 2005 825130 760996
## 2006 851564 804029
## 2007 870712 808452
## 2008 839026 835899
## 2009 830535 815176
## 2010 837848 812769
## 2011 862730 839221
## 2012 863060 824694
## 2013 862788 837701
## 2014 841871 864282
## 2015 848427 850726
## 2016 839892 807517
## 2017 831770 773762
## 2018 812423 763255
## 2019 814066 791316
## 2020 311158 309585
## 2021 469267 450587
## 2022 549008 520241
## 2023              
## 
## $seasonal
##              Jan         Feb         Mar         Apr         May         Jun
## 2000 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2001 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2002 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2003 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2004 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2005 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2006 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2007 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2008 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2009 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2010 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2011 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2012 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2013 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2014 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2015 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2016 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2017 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2018 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2019 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2020 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2021 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2022 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
## 2023 -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
##              Jul         Aug         Sep         Oct         Nov         Dec
## 2000 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2001 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2002 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2003 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2004 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2005 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2006 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2007 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2008 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2009 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2010 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2011 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2012 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2013 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2014 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2015 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2016 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2017 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2018 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2019 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2020 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2021 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2022 -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 2023 -20385.6561                                                            
## 
## $trend
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2000       NA       NA       NA       NA       NA       NA 790686.3 794021.1
## 2001 803859.0 806531.7 806068.1 805493.6 806502.5 806718.2 806505.6 806030.9
## 2002 801294.0 800023.2 800363.7 802119.6 801047.5 799658.5 798937.9 797300.5
## 2003 789818.6 787681.5 786275.5 784432.6 779844.5 777379.4 775994.5 775172.8
## 2004 777650.9 778243.6 779682.3 780596.3 784223.4 787607.7 788625.5 789306.6
## 2005 793298.5 794739.2 798048.9 800450.5 802158.8 803036.4 804286.8 806273.4
## 2006 818205.2 821063.4 822054.3 823718.5 826837.5 829732.0 833402.7 836721.6
## 2007 866863.0 872600.1 877619.6 881468.3 883919.1 884901.2 885665.8 887884.9
## 2008 883778.8 885152.3 885509.6 887634.1 887352.3 887175.7 887287.2 885021.2
## 2009 871215.7 865692.2 860745.4 856274.1 853681.6 852464.4 849995.5 845882.9
## 2010 840862.5 840673.9 841041.2 839370.8 838452.2 838656.6 838755.3 840379.5
## 2011 845949.7 846113.4 847163.6 849568.5 851889.8 854028.7 857200.9 862584.8
## 2012 869338.6 872442.0 873950.6 873593.6 873839.3 873247.8 873517.2 872446.4
## 2013 872986.2 873574.5 874320.4 877280.8 879262.2 879792.9 878991.2 877610.8
## 2014 884717.7 885426.5 886667.0 888522.1 888230.5 888466.5 888990.1 887060.6
## 2015 879821.6 878055.4 874704.9 871894.2 870767.8 870476.1 868994.9 870266.9
## 2016 866239.6 863837.2 864263.0 861034.2 858006.1 855850.1 854327.6 852327.0
## 2017 842077.7 840207.6 837463.6 836737.5 837309.4 835564.5 833573.9 832498.4
## 2018 824859.4 824918.0 823524.8 822771.3 822402.5 821158.5 820095.6 818416.5
## 2019 818130.3 819290.9 820661.9 821665.5 821488.2 822725.9 825617.8 829334.9
## 2020 625412.7 582383.9 539357.5 494384.1 449925.2 408898.5 367068.8 323944.8
## 2021 336683.4 346129.9 356255.6 368023.8 380839.2 393302.2 403340.1 413824.2
## 2022 471104.4 478480.1 486998.3 494587.9 501340.5 507565.3 516006.2 525096.0
## 2023 553113.0       NA       NA       NA       NA       NA       NA         
##           Sep      Oct      Nov      Dec
## 2000 794339.8 796392.2 799672.2 801581.6
## 2001 804711.6 804480.9 804839.5 802853.4
## 2002 796302.1 795716.7 793535.7 791201.1
## 2003 776891.5 777547.9 776455.5 776930.9
## 2004 789300.6 789810.4 791398.1 792879.7
## 2005 808067.2 809007.4 811291.3 815330.8
## 2006 840674.6 846401.2 854740.6 861907.4
## 2007 887656.6 886797.6 884436.3 881371.6
## 2008 884117.3 883020.0 879275.6 875491.7
## 2009 843210.6 842876.7 842545.3 842124.2
## 2010 842382.7 842411.4 843008.2 844935.0
## 2011 866150.5 866948.4 868319.0 868738.1
## 2012 869446.8 869959.6 871859.5 872184.4
## 2013 878603.7 880190.6 881396.0 883009.3
## 2014 885482.3 884455.6 881487.7 879481.0
## 2015 872444.3 871152.4 870210.9 870087.3
## 2016 849167.2 846768.8 845079.9 843874.0
## 2017 830127.8 828016.7 827043.9 825450.6
## 2018 817002.3 817638.9 818781.1 818009.0
## 2019 817711.7 775694.7 719759.0 669553.5
## 2020 296024.2 297009.8 311778.5 325414.6
## 2021 426996.9 440291.2 452707.2 463383.9
## 2022 532003.9 537573.0 543607.9 549513.1
## 2023                                    
## 
## $random
##                Jan           Feb           Mar           Apr           May
## 2000            NA            NA            NA            NA            NA
## 2001   23569.96222     438.48833    9253.12848      26.82355   29638.31219
## 2002   19852.00389    1835.94666  -11593.45485   24413.82355   22687.31219
## 2003   15996.37889   -9789.26167    2500.79515   28046.86522    5600.22886
## 2004   -9685.91278   17778.57166   32212.96181   24517.11522  -17629.64614
## 2005   -9695.57944    1990.94666   12994.37848   17750.99022  -10549.02114
## 2006    2068.79556  -13326.17834   21033.96181  -14998.05145   29068.27052
## 2007   -1524.07944  -30273.92834   25751.67015    4406.11522   94978.68719
## 2008   -4512.82944   -3494.09501  -26950.32985   22437.32355   10676.43719
## 2009  -16719.70444  -13648.01167    5733.87848   19541.36522  -11262.85448
## 2010  -24896.57944  -48802.72001   21477.00348   32391.69855      65.56219
## 2011  -25207.70444  -20038.17834   29226.67015    9010.94855   14134.02052
## 2012    1084.33722   33166.19666    8482.67015   -1906.13478   11969.43719
## 2013   18429.71222  -14658.30334  -17185.12152   32010.65689   14540.52052
## 2014  -25548.74611  -27394.30334   -4815.78819   34138.32355   21131.22886
## 2015  -34668.62111  -52315.17834    1560.33681   31710.19855  -13579.97948
## 2016  -43078.62111   14422.98833   11740.21181   11825.24022    7331.64552
## 2017  -12248.70444  -16629.38667   17385.62848    -284.09311   23900.35386
## 2018   -9051.45444  -13129.84501  -13780.57985    8120.11522   21022.31219
## 2019  -17325.37111  -32798.67834  -19562.62152   33148.94855   25427.56219
## 2020  216738.25389  251972.32166  -65078.24652 -321157.67645 -263878.39614
## 2021  -16714.45444  -24569.72001  -39274.32985  -13844.38478  -21297.43781
## 2022  -51227.45444   -5208.88667   -5584.03819   14222.57355    1554.22886
## 2023    -308.07944            NA            NA            NA            NA
##                Jun           Jul           Aug           Sep           Oct
## 2000            NA  -26554.63560   -6767.46531  -14616.70807  -15146.43814
## 2001   11525.15499   -1105.96894    4099.78469  -58991.49973   -3020.06314
## 2002  -16676.13667    4296.78106  -15501.79865  -14078.95807   11383.06186
## 2003  -12139.05334   11802.15606  -29227.17365    7440.58360  -16786.10480
## 2004    7892.65499  -13809.84394  -16153.92365    2352.54193  -14432.60480
## 2005    3006.94666  -29962.13560    1947.28469   27948.91693  -20354.56314
## 2006    8962.40499  -22744.05227    3762.03469  -13139.49973   -9326.35480
## 2007    2805.15499   -5084.13560   20365.74302   -7421.45807  -10052.81314
## 2008    7847.65499   43749.40606    5738.40969   22183.79193   18647.81186
## 2009    7355.98833   13619.15606  -20264.21531    9507.50027    5063.10353
## 2010   14959.73833     782.32273    4789.15969    -398.54140  -23829.56314
## 2011   18325.69666  -12049.21894  -19101.13198    2723.66693  -17538.60480
## 2012  -10619.42834   -4226.55227   21379.24302  -14708.70807  -14981.77147
## 2013  -17360.51167    9739.44773   10892.90969    -642.62473   22614.18686
## 2014   -2873.17834   17581.57273     613.07635   23081.79193   32268.22853
## 2015   19132.27999   41734.73940  -29140.25698    2255.79193   11979.43686
## 2016   22642.23833  -24827.92727   12372.65969   12179.95860  -27776.02147
## 2017   18114.82166  -22374.21894    5619.24302   -8053.66640   12823.14520
## 2018   12066.82166   -2629.96894   14842.15969  -23506.16640   33694.93686
## 2019  -11522.51167   16781.82273    6842.74302    5769.37527   69741.06186
## 2020 -141760.17834  -39982.17727   -5146.09031   12202.95860  -15680.97980
## 2021   27609.19666   40424.57273   15012.45135   14208.25027   -9502.39647
## 2022   26233.02999     164.40606    8350.70135    9028.25027  -24461.14647
## 2023            NA            NA                                          
##                Nov           Dec
## 2000   19052.98759  -16612.06857
## 2001   17882.73759  -16702.86024
## 2002   -2184.51241   -7014.56857
## 2003  -48879.30408   11867.59809
## 2004    8612.15425    4707.80643
## 2005   16443.90425  -24405.27691
## 2006    -571.34575  -27948.90191
## 2007  -11119.05408  -42990.11024
## 2008  -37644.34575   -9663.19357
## 2009   -9405.05408    2981.26476
## 2010   -2554.92908   -2236.48524
## 2011   -2983.72075     412.43143
## 2012   -6194.26241  -17560.90191
## 2013  -16002.72075  -15378.77691
## 2014  -37011.42908   14730.47309
## 2015  -19178.67908   10568.18143
## 2016   -2582.67908   -6427.52691
## 2017    7331.36259  -21759.06857
## 2018   -3752.88741  -24824.48524
## 2019   96912.19592  151692.05643
## 2020    1984.77925   14099.88976
## 2021   19165.02925   17132.63976
## 2022    8005.32092     657.43143
## 2023                            
## 
## $figure
##  [1] -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
##  [7] -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
## 
## $type
## [1] "additive"
## 
## attr(,"class")
## [1] "decomposed.ts"
# The decomposition is additive.

monthly_indices <- decomp_transit$seasonal[1:12]
monthly_indices
##  [1] -27135.9622 -45757.1967  31380.7465  -1776.4486  14539.2295 -10942.3633
##  [7] -20385.6561   -189.6597  22792.8747  70009.1881  -2605.2376 -29929.5148
# February has the lowest monthly index (-45757.1967) which could possibly be the winter effect and October has the highest monthly index (70009.1881) which could be due to back-to school/work pattern changes. 

# February was lower due to mid-winter effects discourage travel, it is also a shorter month and therefore fewer commuting days. In the recent years, there is more flexibility for remote work arrangements in the colder months reducing the transit usage. Additionally, in this month people are less likely to go out for social or recreational activities.

# October has higher because both work and school are in high peak leading to higher usage of transit. It is also a full work/school month having most number of commuting days compared to the holiday winter months. It also has milder weather, making it favorable and more attractive to use public transportation for fall events, sports, and social activities which would increase mobility and higher usage of the public transportation services.

seasadj <- seasadj(stl_decomp)
seasadj
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2000 753278.5 801529.0 811357.6 760753.0 802351.7 804104.2 765494.7 787285.0
## 2001 828637.5 806206.0 814329.6 807039.0 834785.7 817494.2 806762.7 810162.0
## 2002 822354.5 801095.0 787778.6 828052.0 822379.7 782233.2 804597.7 781830.0
## 2003 807023.5 777128.0 787784.6 813998.0 784089.7 764491.2 789159.7 745977.0
## 2004 769173.5 795258.0 810903.6 806632.0 765238.7 794751.2 776178.7 773184.0
## 2005 784811.5 795966.0 810051.6 819720.0 790254.7 805294.2 775687.7 808252.0
## 2006 821482.5 806973.0 842096.6 810239.0 854550.7 837945.2 812021.7 840515.0
## 2007 866547.5 841562.0 902379.6 887393.0 977542.7 886957.2 881944.7 908282.0
## 2008 880474.5 880894.0 857567.6 911590.0 896673.7 894274.2 932399.7 890791.0
## 2009 855704.5 851280.0 865487.6 877334.0 841063.7 859071.2 864977.7 825650.0
## 2010 817174.5 791107.0 861526.6 873281.0 837162.7 852867.2 840900.7 845200.0
## 2011 821950.5 825311.0 875398.6 860098.0 864668.7 871605.2 846514.7 843515.0
## 2012 871631.5 904844.0 881441.6 873206.0 884453.7 861879.2 870653.7 893857.0
## 2013 892624.5 858152.0 856143.6 910810.0 892447.7 861683.2 890093.7 888535.0
## 2014 860377.5 857268.0 880859.6 924179.0 908006.7 884844.2 907934.7 887705.0
## 2015 846361.5 824976.0 875273.6 905123.0 855832.7 888859.2 912092.7 841158.0
## 2016 824369.5 877496.0 875011.6 874378.0 863982.7 877743.2 830862.7 864731.0
## 2017 831037.5 822814.0 853857.6 837972.0 859854.7 852930.2 812562.7 838149.0
## 2018 817016.5 811024.0 808752.6 832410.0 842069.7 832476.2 818828.7 833290.0
## 2019 802013.5 785728.0 800107.6 856333.0 845560.7 810454.2 843762.7 836209.0
## 2020 843359.5 833592.0 473287.6 174745.0 184691.7 266389.2 328449.7 318830.0
## 2021 321177.5 320796.0 315989.6 355698.0 358186.7 420162.2 445127.7 428868.0
## 2022 421085.5 472507.0 480422.6 510329.0 501539.7 533049.2 517533.7 533478.0
## 2023 554013.5 557732.0 560987.6 563422.0 593285.7 583027.2 553954.7         
##           Sep      Oct      Nov      Dec
## 2000 779732.2 781205.4 818560.6 784904.2
## 2001 745729.2 801420.4 822557.6 786085.2
## 2002 782232.2 807059.4 791186.6 784121.2
## 2003 784341.2 760721.4 727411.6 788733.2
## 2004 791662.2 775337.4 799845.6 797522.2
## 2005 836025.2 788612.4 827570.6 790860.2
## 2006 827544.2 837034.4 854004.6 833893.2
## 2007 880244.2 876704.4 873152.6 838316.2
## 2008 906310.2 901627.4 841466.6 865763.2
## 2009 852727.2 847899.4 832975.6 845040.2
## 2010 841993.2 818541.4 840288.6 842633.2
## 2011 868883.2 849369.4 865170.6 869085.2
## 2012 854747.2 854937.4 865500.6 854558.2
## 2013 877970.2 902764.4 865228.6 867565.2
## 2014 908573.2 916683.4 844311.6 894146.2
## 2015 874709.2 883091.4 850867.6 880590.2
## 2016 861356.2 818952.4 842332.6 837381.2
## 2017 822083.2 840799.4 834210.6 803626.2
## 2018 793505.2 851293.4 814863.6 793119.2
## 2019 823490.2 845395.4 816506.6 821180.2
## 2020 308236.2 281288.4 313598.6 339449.2
## 2021 441214.2 430748.4 471707.6 480451.2
## 2022 541041.2 513071.4 551448.6 550105.2
## 2023
plot(seasadj)
lines(Transit_ts, col="Red")

# Seasonality does not have big fluctuations in the values of the time series because it is stable so even without seasonality, the graph follows the original time series plot.

# Naive
naive_model <- naive(Transit_ts)
plot(naive_model)

checkresiduals(naive_model)

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 281.61, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
naive_residuals <- residuals(naive_model)
naive_residuals
##          Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep
## 2000      NA   31602   87194  -86272   60788  -24335  -50165   43318   15452
## 2001   45253  -39080   85489  -42958   46936  -43379  -22287   24927  -41428
## 2002   37789  -37908   64049    4606   13517  -66234   10809   -1240   23407
## 2003   24422  -46544   88022   -9454  -10719  -45686   13113  -21655   61369
## 2004  -18040    9436   93011  -39939  -22204    3425  -30128   18533   41483
## 2005  -11191   -5494   91451  -25999  -10276  -11048  -41162   54092   50778
## 2006   32142  -31158  112489  -67525   63501  -42693  -37479   50021   10034
## 2007   34174  -41634  138183  -50654  109339 -116673  -16568   47865   -5033
## 2008   43678  -16229   54039   18355    4273  -28487   26570  -20081   38524
## 2009   -8539  -21073   91573  -23821  -17081   -8080   -5649  -17800   50082
## 2010  -26346  -42716  147785  -23913  -16929  -10383  -23522   25827   19798
## 2011  -19163  -13288  127453  -50968   23760  -19151  -36646   18528   48373
## 2012    4066   16564   53963  -43903   30437  -48662   -2781   44731  -16105
## 2013   39586  -51121   75357   18999     827  -56852   16855   19969   12440
## 2014   -5668  -19758  100957    7652    3017  -49250   11535    1298   43873
## 2015  -46265  -38034  127663   -5818  -30101    6939   11678  -49407   56556
## 2016  -54701   36478   74881  -36301    8794  -12327  -58436   55396   19630
## 2017   -4824  -24872  108409  -51553   41072  -33012  -51923   47114    6939
## 2018   14910  -22641   75094  -12010   28849  -35681  -25203   35989  -16780
## 2019   10414  -32934   91745   20558    8417  -61194   21753   13974   10286
## 2020   23699  -26416 -282939 -334210   29136   55610   50505   11908   12411
## 2021  -16752  -17030   72559    4041   21678   35888   13410    5268   35351
## 2022  -57846   34773   85281   -5761   10400    5422  -27071   37472   30568
## 2023    5428  -12930   80621  -33233   49053  -36346  -40628                
##          Oct     Nov     Dec
## 2000   48739  -35135  -61080
## 2001  102957  -51353  -63896
## 2002   72093  -88363  -34489
## 2003   23646 -105800   33898
## 2004   30941  -47982  -29747
## 2005    -147  -33532  -64134
## 2006   56756  -55520  -47535
## 2007   43726  -76042  -62260
## 2008   42583 -132651   -3127
## 2009   42438  -87414  -15359
## 2010   23814  -50743  -25079
## 2011   27752  -56689  -23509
## 2012   47456  -61927  -38366
## 2013   72060 -110026  -25087
## 2014   55376 -144862   22411
## 2015   55648 -104714    2299
## 2016    4862  -49110  -32375
## 2017   65982  -79079  -58008
## 2018  105054 -108920  -49168
## 2019   69171 -101379  -22750
## 2020   20318  -40180   -1573
## 2021   36800  -31531  -18680
## 2022   19296  -34113  -28767
## 2023
plot(naive_residuals,
     main = "Residuals from Naive Forecast for the Usage of Public Transportation Services",
     ylab = "Residuals", xlab = "Time",
     col = "blue", type = "l")
abline(h = 0, col = "red")

# The plot indicates that the model captured the main pattern of the public transportation usage, because the pattern oscillates over 800,000 transit users except for the sharp drop around 2020 which is a huge negative deviation from the norm.

hist(naive_residuals,
     main = "Histogram of Residuals",
     xlab = "Residuals",
     col = "lightblue")

# The histogram roughly forms a bell-shaped curve centered around 0, however due to the sudden drop around 2020 and the gradual recovery is still below the pre-2020 levels explaining the lower outliers in the boxplot earlier, resulting in the negatively skewed AKA left skewed distribution of the histogram.

plot(fitted(naive_model), residuals(naive_model),
     main = "Residuals vs Fitted Values",
     xlab = "Fitted Values",
     ylab = "Residuals",
     col = "steelblue", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# The cluster at the higher values suggests that the data points around and above 800,000 is where the model consistently predicts well and captures a stable pattern which indicates smaller variation. The random scatter at lower fitted values could indicate greater variation in those years due to the external sudden shock like the COVID-19 pandemic in 2020 and the post-pandemic recovery years. At the lower values, we observe heteroscedasticity or non-constant variance because the scatterplot is wider and more random at those values.

plot(Transit_ts, residuals(naive_model),
     main = "Residuals vs Actual Values",
     xlab = "Actual Values",
     ylab = "Residuals",
     col = "navy", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# This plot is similar to the fitted vs. residuals plot meaning that the model fits the data well. It indicates higher variation at lower values and smaller variation at higher values during the stable years. The clustering around the higher values indicates there is a model misfit in this case due to heteroscedasticity.

Acf(naive_residuals)

# The bars outside of the blue dashed line which is the approximate 95% confidence interval, indicate significantly autocorrelation at that lag. There are significant spikes at lag 12 and lag 24 which suggests that there are patterns that are left unexplained by the naive model. The spikes at some lags indicate seasonality or external shocks which might be better captured by other models.

accuracy(naive_model)
##                     ME    RMSE      MAE        MPE     MAPE      MASE
## Training set -683.4326 55803.6 41483.85 -0.6755225 6.063339 0.7298063
##                    ACF1
## Training set -0.1325041
naive_forecast <- naive(Transit_ts,12)
naive_forecast
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Aug 2023         532206 460690.8 603721.2 422832.9 641579.1
## Sep 2023         532206 431068.2 633343.8 377529.1 686882.9
## Oct 2023         532206 408338.0 656074.0 342766.3 721645.7
## Nov 2023         532206 389175.6 675236.4 313459.9 750952.1
## Dec 2023         532206 372293.2 692118.8 287640.4 776771.6
## Jan 2024         532206 357030.3 707381.7 264297.8 800114.2
## Feb 2024         532206 342994.6 721417.4 242832.1 821579.9
## Mar 2024         532206 329930.5 734481.5 222852.3 841559.7
## Apr 2024         532206 317660.4 746751.6 204086.8 860325.2
## May 2024         532206 306055.1 758356.9 186338.0 878074.0
## Jun 2024         532206 295016.9 769395.1 169456.6 894955.4
## Jul 2024         532206 284470.1 779941.9 153326.6 911085.4
plot(naive_forecast)

# The MAPE for this type of forecast is 6.063339 which is less than 10%, so this is a good indicator that the naive forecast is overall performing well especially in the pre-COVID-19 time, and the data does not have extremely small values that impact the percentages except for the years of 2020-2021.

# In 1 year, the time series value for the usage of public transportation services will be 532206.

# The forecast is taking the last value and predicting the values for the next whole year for the point level forecast.

# Simple Moving Averages
plot(Transit_ts,
     main = paste("Time Series Plot for the Usage of the Public Transportation Services"),
     xlab = "Year",
     ylab = "Public Transportation Services Usage",
     col = "black")

MA3_forecast <- ma(Transit_ts,order=3)
lines(MA3_forecast, col = "red")

MA6_forecast <- ma(Transit_ts,order=6)
lines(MA6_forecast, col = "blue")

MA9_forecast <- ma(Transit_ts,order=9)
lines(MA9_forecast, col = "green")

# It is observed that the moving average of the order 9 is the best fit that has the best smoothing. So in forecasting the next 12 months, it will be the best order fit to be used.

# Moving Average with the Order = 9
MA9_model <- ma(Transit_ts, order = 9, centre = FALSE)
MA9_f <- forecast(MA9_model, h = 12)
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
MA9_f
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Apr 2023       548293.2 537051.7 559534.6 531100.9 565485.4
## May 2023       544672.6 521526.8 567818.3 509274.2 580070.9
## Jun 2023       541776.1 505888.4 577663.8 486890.5 596661.6
## Jul 2023       539458.9 490582.2 588335.6 464708.4 614209.4
## Aug 2023       537605.2 475825.9 599384.4 443122.0 632088.4
## Sep 2023       536122.2 461715.7 610528.7 422327.2 649917.1
## Oct 2023       534935.8 448281.0 621590.5 402408.7 667462.8
## Nov 2023       533986.6 435514.7 632458.6 383386.8 684586.5
## Dec 2023       533227.3 423389.4 643065.2 365244.8 701209.9
## Jan 2024       532619.9 411867.7 653372.1 347945.3 717294.5
## Feb 2024       532133.9 400907.4 663360.4 331440.3 732827.5
## Mar 2024       531745.2 390465.9 673024.4 315677.2 747813.1
plot(MA9_f)

accuracy(MA9_f)
##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set -195.8352 8691.624 6867.204 0.00766116 0.9437753 0.1484077
##                    ACF1
## Training set 0.06643575
# Based on the moving average of order 9 forecast, in 1 year the public transportation usage will be 531745.

MA3_model <- ma(Transit_ts, order = 3, centre = FALSE)
MA3_f <- forecast(MA3_model, h = 12)
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
MA3_f
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jul 2023       567792.1 539434.8 596149.3 524423.4 611160.8
## Aug 2023       579627.0 537198.3 622055.6 514737.9 644516.0
## Sep 2023       610628.4 556179.1 665077.7 527355.4 693901.4
## Oct 2023       610519.2 545147.1 675891.3 510541.2 710497.2
## Nov 2023       593501.9 517978.7 669025.2 477999.1 709004.8
## Dec 2023       561566.1 476507.0 646625.3 431479.4 691652.9
## Jan 2024       547527.0 453454.2 641599.8 403655.1 691398.9
## Feb 2024       565911.6 463283.3 668539.8 408955.2 722867.9
## Mar 2024       576935.3 466160.5 687710.1 407519.9 746350.8
## Apr 2024       598185.3 479632.9 716737.8 416875.0 779495.7
## May 2024       584492.6 458497.1 710488.1 391799.1 777186.1
## Jun 2024       577737.2 444603.7 710870.7 374127.1 781347.3
plot(MA3_f)

accuracy(MA3_f)
##                     ME     RMSE      MAE         MPE     MAPE      MASE
## Training set -474.3732 21447.51 9418.979 -0.06588614 1.651357 0.1843854
##                   ACF1
## Training set 0.5855302
MA6_model <- ma(Transit_ts, order = 6, centre = FALSE)
MA6_f <- forecast(MA6_model, h = 12)
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
MA6_f
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## May 2023       566436.9 551704.6 581169.3 543905.7 588968.1
## Jun 2023       568133.7 541205.3 595062.0 526950.3 609317.1
## Jul 2023       569491.1 528763.0 610219.2 507202.9 631779.3
## Aug 2023       570577.1 515573.9 625580.2 486457.0 654697.1
## Sep 2023       571445.8 502182.1 640709.5 465516.1 677375.5
## Oct 2023       572140.8 488882.8 655398.8 444808.7 699473.0
## Nov 2023       572696.8 475844.2 669549.5 424573.5 720820.2
## Dec 2023       573141.6 463160.8 683122.4 404940.5 741342.8
## Jan 2024       573497.5 450882.2 696112.7 385973.7 761021.3
## Feb 2024       573782.1 439029.5 708534.7 367695.8 779868.4
## Mar 2024       574009.9 427605.8 720413.9 350104.2 797915.5
## Apr 2024       574192.1 416603.0 731781.1 333180.4 815203.8
plot(MA6_f)

accuracy(MA6_f)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -186.0535 11391.86 8072.671 0.03998999 1.175111 0.1671021
##                    ACF1
## Training set 0.02346138
# As the moving average order goes up, the smoothing is stronger. At the 9th order, it produces a smoother trend and less random variation. It has has the best fit and lowest errors. This is also proved by the MAPE as it moves from 1.65 -> 1.18 -> 0.94 respectively. A MAPE below 0 means that it is statistically an extremely close prediction with extremely accurate relative error. This order will be great for long-term forecasting but not for rapid shifts.

# Simple Smoothing
ses_model <- ses(Transit_ts,12) 
plot(ses_model)

summary(ses_model)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
## ses(y = Transit_ts, h = 12)
## 
##   Smoothing parameters:
##     alpha = 0.8374 
## 
##   Initial states:
##     l = 732001.6666 
## 
##   sigma:  55323.68
## 
##      AIC     AICc      BIC 
## 7782.916 7783.002 7793.852 
## 
## Error measures:
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -812.0035 55127.84 40313.36 -0.8363806 6.026904 0.7092143
##                    ACF1
## Training set 0.01607982
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Aug 2023       539578.3 468678.2 610478.5 431145.9 648010.7
## Sep 2023       539578.3 447103.9 632052.8 398150.9 681005.8
## Oct 2023       539578.3 429686.5 649470.1 371513.3 707643.4
## Nov 2023       539578.3 414674.8 664481.9 348554.8 730601.9
## Dec 2023       539578.3 401283.0 677873.6 328073.9 751082.8
## Jan 2024       539578.3 389078.2 690078.4 309408.3 769748.4
## Feb 2024       539578.3 377791.5 701365.1 292146.7 787009.9
## Mar 2024       539578.3 367242.4 711914.2 276013.3 803143.4
## Apr 2024       539578.3 357302.8 721853.8 260812.0 818344.6
## May 2024       539578.3 347877.9 731278.7 246397.9 832758.8
## Jun 2024       539578.3 338895.2 740261.5 232659.9 846496.8
## Jul 2024       539578.3 330297.6 748859.1 219511.1 859645.6
# The value of alpha is 0.8374. Since the alpha is closer to 1 indicating that the model reacts quickly to recent changes, therefore the most weight is on the latest observed values which dominate the forecast.

# The initial state value is 732001.6666. 

# The value of sigma is 55323.68. Sigma gives the typical deviation of actual values from fitted values. Based on the initial state, the forecast deviates from the actual values by 7.5% which is considered to be a good forecast. The deviation in this case is considered moderate and therefore it can be used for practical predictions centered around the usage fluctuations of the public transportation services.

checkresiduals(ses_model)

## 
##  Ljung-Box test
## 
## data:  Residuals from Simple exponential smoothing
## Q* = 249.65, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
ses_residuals <- residuals(ses_model)
ses_residuals
##                Jan           Feb           Mar           Apr           May
## 2000   -7067.66664   30452.53069   92146.73071  -71285.46416   49194.29336
## 2001   34614.16470  -33450.43055   80048.69764  -29939.06089   42066.77886
## 2002   26455.00224  -33605.41828   58583.49079   14133.88642   15815.70332
## 2003   16801.87805  -43811.38056   80896.61617    3702.84266  -10116.77805
## 2004  -15183.12841    6966.65034   94144.04025  -24627.62625  -26209.38142
## 2005  -17134.47941   -8280.71297   90104.24357  -11344.64964  -12121.06816
## 2006   20864.68426  -27764.61495  107973.42583  -49964.44626   55374.89548
## 2007   25230.53170  -37530.56340  132079.11406  -29172.95166  104594.37713
## 2008   31730.21001  -11068.47042   52238.84901   26851.00824    8639.98724
## 2009  -12348.05617  -23081.25992   87819.11610   -9538.29761  -18632.28715
## 2010  -30940.70134  -47748.12568  140019.35285   -1140.56918  -17114.49959
## 2011  -24465.23459  -17266.97041  124644.73899  -30696.06160   18767.66198
## 2012   -1102.31972   16384.72121   56627.77400  -34693.18693   24794.57855
## 2013   31905.99626  -45931.88094   67886.74278   30039.94628    5712.62891
## 2014  -12337.27722  -21764.50686   97417.26984   23495.72441    6838.29147
## 2015  -46182.73894  -45545.05620  120255.65508   13740.10772  -27866.33996
## 2016  -56823.32412   27236.38310   79310.66374  -23402.09305    4987.93652
## 2017  -11348.56932  -26717.70564  104063.69279  -34628.31640   35440.12895
## 2018    3677.01227  -22042.97904   71508.97975    -379.94129   28787.20721
## 2019     -19.88805  -32937.23455   86388.16268   34607.97537   14045.56283
## 2020   17623.99339  -23549.67358 -286769.06564 -380849.47215  -32804.49666
## 2021  -17972.10049  -19952.94177   69313.89868   15314.05569   24168.64338
## 2022  -61534.08818   24765.23509   89308.76182    8763.97500   11825.35307
## 2023     -44.74156  -12937.27667   78516.91059  -20463.18727   45724.91351
##                Jun           Jul           Aug           Sep           Oct
## 2000  -16334.15141  -52821.54944   34727.22268   21099.95694   52170.64927
## 2001  -36537.35424  -28229.35265   20335.84194  -38120.62467   96757.14783
## 2002  -63661.76979     455.19377   -1165.96832   23217.36955   75869.02047
## 2003  -47331.36990    5415.13307  -20774.29501   57990.31565   33077.41376
## 2004    -837.63451  -30264.23098   13610.89393   43696.64500   38047.72350
## 2005  -13019.34311  -43279.43651   47053.13035   58430.61468    9356.02300
## 2006  -33686.95226  -42957.76971   43034.44551   17033.02487   59526.21264
## 2007  -99662.00710  -32776.80340   42534.25484    1884.67499   44032.51928
## 2008  -27081.81202   22165.47535  -16476.05723   35844.37132   48412.64747
## 2009  -11110.31303   -7455.95618  -19012.61985   46989.83052   50080.31974
## 2010  -13166.46350  -25663.36384   21653.16856   23319.62235   27606.65063
## 2011  -16098.66991  -39264.25126   12142.15099   50347.77197   35940.44775
## 2012  -44629.46583  -10039.43537   43098.20894   -9095.60479   45976.71141
## 2013  -55922.91096    7759.82426   21231.04026   15892.96836   74644.79642
## 2014  -48137.83574    3705.97128    1900.73079   44182.13056   62561.68177
## 2015    2406.88150   12069.44976  -47444.05199   48839.80670   63591.19569
## 2016  -11515.77329  -60308.89932   45587.49704   27044.24740    9260.41522
## 2017  -27248.09759  -56354.56896   37948.62038   13110.87778   68114.32350
## 2018  -30999.11374  -30244.62574   31070.08248  -11726.83207  103146.77456
## 2019  -58909.66143   12172.06592   15953.63727   12880.66348   71265.88197
## 2020   50274.75087   58681.57176   21451.83809   15899.87842   22903.92026
## 2021   39818.73349   19886.02874    8502.21873   36733.78162   42774.29915
## 2022    7345.24867  -25876.38537   33263.52322   35977.90417   25147.36495
## 2023  -28909.40350  -45329.75999                                          
##                Nov           Dec
## 2000  -26650.08361  -65414.30932
## 2001  -35616.63634  -69688.60917
## 2002  -76023.83413  -46853.34441
## 2003 -100420.36425   17565.85911
## 2004  -41794.00432  -36544.28232
## 2005  -32010.35758  -69340.09215
## 2006  -45838.79143  -54990.11735
## 2007  -68880.65067  -73462.59321
## 2008 -124777.27653  -23420.49400
## 2009  -79269.05008  -28251.13900
## 2010  -46253.11679  -32601.50229
## 2011  -50843.72688  -31778.10873
## 2012  -54449.45166  -47221.53565
## 2013  -97885.93916  -41006.94772
## 2014 -134687.10959     505.79316
## 2015  -94371.67178  -13049.39523
## 2016  -47603.90702  -40117.19176
## 2017  -68001.04050  -69067.53541
## 2018  -92144.44191  -64154.16360
## 2019  -89788.47816  -37352.99498
## 2020  -36454.95822   -7501.95194
## 2021  -24574.28474  -22676.70608
## 2022  -30023.08945  -33649.88735
## 2023
plot(ses_residuals,
     main = "Residuals from Simple Exponential Smoothing Forecast for the Usage of Public Transportation Services",
     ylab = "Residuals", xlab = "Time",
     col = "blue", type = "l")
abline(h = 0, col = "red")

# The plot indicates that the model captured the main pattern of the public transportation usage, because the pattern oscillates around 800,000 transit users except for the sharp drop around 2020 which is a huge negative deviation from the norm.

hist(ses_residuals,
     main = "Histogram of Residuals",
     xlab = "Residuals",
     col = "seagreen")

# The histogram roughly forms a bell-shaped curve centered around 0, however due to the sudden drop around 2020 and the gradual recovery is still below the pre-2020 levels explaining the lower outliers in the boxplot earlier, resulting in the negatively skewed AKA left skewed distribution of the histogram.

plot(fitted(ses_model), residuals(ses_model),
     main = "Residuals vs Fitted Values",
     xlab = "Fitted Values",
     ylab = "Residuals",
     col = "steelblue", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# The cluster at the higher values suggests that the data points around and above 800,000 is where the model consistently predicts well and captures a stable pattern which indicates smaller variation. The random scatter at lower fitted values could indicate greater variation in those years due to the external sudden shock like the COVID-19 pandemic in 2020 and the post-pandemic recovery years. At the lower values, we observe heteroscedasticity or non-constant variance because the scatterplot is wider and more random at those values.

plot(Transit_ts, residuals(ses_model),
     main = "Residuals vs Actual Values",
     xlab = "Actual Values",
     ylab = "Residuals",
     col = "navy", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# This plot is similar to the fitted vs. residuals plot meaning that the model fits the data well. It indicates higher variation at lower values and smaller variation at higher values during the stable years. The clustering around the higher values indicates there is a model misfit in this case due to heteroscedasticity.

Acf(ses_residuals)

# The bars outside of the blue dashed line which is the approximate 95% confidence interval, indicate significantly autocorrelation at that lag. There are significant spikes at lag 12 and lag 24 which suggests that there are patterns that are left unexplained by the naive model. The spikes at some lags indicate seasonality or external shocks which might be better captured by other models.

accuracy(ses_model)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -812.0035 55127.84 40313.36 -0.8363806 6.026904 0.7092143
##                    ACF1
## Training set 0.01607982
ses_forecast <- ses(Transit_ts,12)
ses_forecast
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Aug 2023       539578.3 468678.2 610478.5 431145.9 648010.7
## Sep 2023       539578.3 447103.9 632052.8 398150.9 681005.8
## Oct 2023       539578.3 429686.5 649470.1 371513.3 707643.4
## Nov 2023       539578.3 414674.8 664481.9 348554.8 730601.9
## Dec 2023       539578.3 401283.0 677873.6 328073.9 751082.8
## Jan 2024       539578.3 389078.2 690078.4 309408.3 769748.4
## Feb 2024       539578.3 377791.5 701365.1 292146.7 787009.9
## Mar 2024       539578.3 367242.4 711914.2 276013.3 803143.4
## Apr 2024       539578.3 357302.8 721853.8 260812.0 818344.6
## May 2024       539578.3 347877.9 731278.7 246397.9 832758.8
## Jun 2024       539578.3 338895.2 740261.5 232659.9 846496.8
## Jul 2024       539578.3 330297.6 748859.1 219511.1 859645.6
plot(ses_forecast)

# The MAPE for this type of forecast is 6.026904 which is less than 10%, so this is a good indicator that the simple smoothing forecast method is overall performing well especially in the pre-COVID-19 time, and the data does not have extremely small values that impact the percentages except for the years of 2020-2021.

# In 1 year, the time series value for the usage of public transportation services will be 539578.

# The forecast is taking the last value and predicting values close to that for the upcoming whole year.

# Holt-Winters
HW_model <- HoltWinters(Transit_ts)
HW_model
## Holt-Winters exponential smoothing with trend and additive seasonal component.
## 
## Call:
## HoltWinters(x = Transit_ts)
## 
## Smoothing parameters:
##  alpha: 0.9284877
##  beta : 0
##  gamma: 1
## 
## Coefficients:
##           [,1]
## a   551045.439
## b     1531.450
## s1   -9283.195
## s2    7191.342
## s3   51986.625
## s4  -14146.686
## s5  -39238.939
## s6  -36522.821
## s7  -43226.301
## s8   23642.699
## s9   -2666.430
## s10  25685.841
## s11   4968.181
## s12 -18839.439
plot(HW_model)

# Alpha = 0.9284877, therefore the level updates very quickly and the dataset is highly sensitive to recent changes in values. With the consideration of the post-2020 recovery efforts, an alpha of this scale is helpful to detect trends and turning points faster while giving less weight to the pre-COVID era in public transportation usage. 

# Beta = 0; The beta component being equal to 0 means that there is no meaningful trend. The trend does not change over time and stays constant except for the noticeable trend drop in 2020.

# Gamma = 1; the model completely relies on the most recent seasonal observation and it updates fully to match the last season's deviation. In this case, it means that the past seasonal values like the pre-2020 seasonal values hold no influence on the incoming data. 

# a = 551045.439 is the initial level of the series; it is the baseline around which the trend and seasonality of the series oscillates. 

# b = 1531.450 is the initial trend. This is a positive trend which states that per period there is an increase of 1531.450 units on the top of the a value above. 

# s1 - s12 values are the seasonal indices for months 1 to 12. These represent the additive seasonal adjustments relative to the baseline. For certain months, the value is positive indicating an above average value and for others, the value is negative indicating a below average value.

checkresiduals(HW_model)

## 
##  Ljung-Box test
## 
## data:  Residuals from HoltWinters
## Q* = 62.11, df = 24, p-value = 3.198e-05
## 
## Model df: 0.   Total lags used: 24
HW_residuals <- residuals(HW_model)
HW_residuals
##               Jan          Feb          Mar          Apr          May
## 2001   12443.7337    2031.1375   -1849.7824   -2238.2324    -682.6780
## 2002   -7837.7936    2497.3880  -22978.9064   43842.5551  -30917.5780
## 2003  -18989.5444   -7675.1911    2088.4968   26796.6280  -51026.3047
## 2004  -53984.8072   44993.1036   10145.7022   -4879.1173  -59211.2151
## 2005  -42161.5620   23830.4765    9564.3313   10093.7664  -42327.0593
## 2006    2811.7704   -3336.6186   29679.7552  -30031.5953   32329.2171
## 2007    4492.0584  -13252.7723   52303.5532   -7272.6228   75335.2001
## 2008   12318.8458   13980.9119  -34580.9855   59783.4945  -30842.9375
## 2009  -38251.4800    5401.6556    5812.2635   13747.8898  -49008.1467
## 2010  -52030.5608  -20348.4520   60153.4517   16974.4564  -44137.5817
## 2011  -40608.5445    7630.7030   36065.4312   -8715.3049    -915.4546
## 2012  -13905.1547   35942.6252  -37433.3509   -3703.9972    5562.1304
## 2013   22029.7338  -32737.3171  -15703.5269   58339.8893  -20273.6128
## 2014  -24667.2299    -797.2070   10962.4577   43604.8221  -13515.5216
## 2015  -60413.9964  -23336.5383   35215.6590   29534.8941  -43554.8891
## 2016  -63295.0627   48317.9373  -16629.3675   -4249.4166   -1849.0659
## 2017  -10134.1860  -17212.1061   16856.9596  -17992.0519   29274.5127
## 2018    7277.1760  -13229.8222  -18609.6142   21506.7848   16496.0249
## 2019    -124.4162  -22585.6250   -2242.9477   52376.3878   -1370.0893
## 2020   12714.2994  -13543.2476 -377735.0577 -333149.8536   -4377.4099
## 2021  -27373.9872   -5146.3151    4407.6052   29240.6430   -9431.3070
## 2022  -66346.2381   42280.1402   19837.9557   18766.2356  -18692.8370
## 2023    1271.3492   -8355.4911   13161.7786   -9106.5517   20645.6999
##               Jun          Jul          Aug          Sep          Oct
## 2001   -1364.5202   23607.0531  -14899.4233  -59158.2334   50508.4274
## 2002  -26332.9261   53131.7323  -36201.3616    7318.4644   16555.8156
## 2003   -7550.8054   51096.1869  -50373.5167   41154.7804  -30132.0569
## 2004   37865.8420    6909.0551   -6089.1105   17890.2630  -19402.8723
## 2005   17658.0664   -3356.2588   29665.3219   28027.3241  -47099.0318
## 2006  -12937.7665    -358.4541   23447.2538  -13044.2071   12239.3057
## 2007  -80605.1672   14813.9221   20673.8648  -25699.9516   -3503.8163
## 2008   11139.4423   57689.1518  -44625.0884   16503.6691   -3216.0358
## 2009   27245.1523   23293.0326  -37487.1101   24200.6665   -1400.4055
## 2010   19837.4117    5172.9133    9190.6045   -7156.7369  -20436.0538
## 2011    9585.3274   -7635.5450     688.3286   21979.2814  -13464.8373
## 2012  -20213.3806   25329.9856   28653.5091  -42021.4295    4197.0163
## 2013  -28407.6879   41123.0831    4783.2366  -10129.3220   27776.5075
## 2014  -19740.7154   31450.5754  -11980.7218   21171.2802   10620.1527
## 2015   34745.2792   31829.1871  -59552.7762   28081.5205   12140.8546
## 2016   12862.3349  -39641.1754   46674.1473   -7514.8787  -40050.7713
## 2017   -6648.9932  -30768.8299   32854.0251  -17319.0072   22694.8285
## 2018   -7662.8407   -2396.4684   19208.1826  -38425.8653   57395.9496
## 2019  -32725.8318   42390.6103   -1148.9896   -8694.1117   16786.7000
## 2020   86105.4279   74268.7557    2178.3038   -5791.6007  -33680.9253
## 2021   59551.3801   36121.2829   -2034.3571   17417.0883  -13544.7909
## 2022   23489.9594   -5263.0116   29938.7545   13529.5409  -29112.6443
## 2023  -18481.4400  -19765.2913                                       
##               Nov          Dec
## 2001  -10857.4785   -3214.5598
## 2002  -45907.0919   23139.4007
## 2003  -62215.9835   85422.4444
## 2004   -1336.3212   15573.1290
## 2005    9841.0839  -19223.7825
## 2006  -11975.4138   -2106.4353
## 2007  -31891.5907  -18961.4390
## 2008  -86449.9370   35345.3060
## 2009  -35130.8528   18073.3966
## 2010    2591.0055    7246.2158
## 2011   -4503.1841    7975.9897
## 2012   -9119.0131   -8103.5126
## 2013  -54579.5311    1851.8823
## 2014  -84752.9544   43156.5748
## 2015  -37675.8592   17264.0648
## 2016   17758.3055  -17374.5910
## 2017  -11857.6726  -42613.0637
## 2018  -36746.1896  -33353.5102
## 2019  -25376.9317   -6365.0871
## 2020   35228.2310   17786.3451
## 2021   40389.3620    2295.7379
## 2022   32837.1164   -5607.1791
## 2023
sigma <- sd(HW_residuals, na.rm = TRUE)
sigma
## [1] 43052.6
# Sigma = 43052.6; Based on the alpha value, the model deviates by about 43,052 units, which means that the relative error is 7.8%. Since the forecasting model relative error is under 10%, the model is reasonably accurate and most forecasts are within +/- 7.8% of the actual values. When it comes down to budget planning for the public transportation services, this level of standard deviation is acceptable and the city officials does not need a funding or service improvements based on this sigma.

plot(HW_residuals,
     main = "Residuals from Holt-Winters Forecast for the Usage of the Public Transportation Services",
     ylab = "Residuals", xlab = "Time",
     col = "blue", type = "l")
abline(h = 0, col = "red")

# The plot indicates that the model captured the main pattern of the public transportation usage, because the pattern oscillates around 800,000 transit users except for the sharp drop around 2020 which is a huge negative deviation from the norm. 

hist(HW_residuals,
     main = "Histogram of Residuals",
     xlab = "Residuals",
     col = "lightblue")

# The histogram roughly forms a bell-shaped curve centered around 0, however due to the sudden drop around 2020 and the gradual recovery is still below the pre-2020 levels explaining the lower outliers in the boxplot earlier, resulting in the negatively skewed AKA left skewed distribution of the histogram.

plot(fitted(HW_model), residuals(HW_model),
     main = "Residuals vs Fitted Values",
     xlab = "Fitted Values",
     ylab = "Residuals",
     col = "steelblue", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# The xhat plot follows the shape of the actual data very closely, therefore it can be interpreted that the model captures the overall pattern well. The level graph follows the original time series pattern exactly including the 2020 sharp decline. The trend line is flat meaning the model does not have a strong upward or downward growth, therefore it stays constant. The seasonality oscillates between -40K - 40K at cyclical regular intervals suggesting a stable and strong seasonal effect over time.

plot(Transit_ts, residuals(HW_model),
     main = "Residuals vs Actual Values",
     xlab = "Actual Values",
     ylab = "Residuals",
     col = "navy", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# This plot is similar to the fitted vs. residuals plot meaning that the model fits the data well. It indicates higher variation at lower values and smaller variation at higher values during the stable years. The clustering around the higher values indicates there is a model misfit in this case due to heteroscedasticity.

Acf(HW_residuals)

# ACF plot has most of the spikes between the blue dashed line which indicates there are no significant patterns left unexplained by the model. The model mostly captures the white noise, indicating that it is a good fit for the dataset.

HW_forecast <- forecast(HW_model,12)
HW_forecast
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Aug 2023       543293.7 488119.6 598467.8 458912.1 627675.2
## Sep 2023       561299.7 486009.9 636589.4 446154.0 676445.4
## Oct 2023       607626.4 516561.0 698691.8 468353.9 746898.9
## Nov 2023       543024.6 438538.8 647510.3 383227.4 702821.7
## Dec 2023       519463.8 403095.3 635832.2 341493.5 697434.0
## Jan 2024       523711.3 396565.8 650856.8 329259.0 718163.6
## Feb 2024       518539.3 381461.4 655617.2 308896.8 728181.8
## Mar 2024       586939.7 440602.1 733277.4 363135.6 810743.8
## Apr 2024       562162.1 407116.7 717207.4 325040.6 799283.5
## May 2024       592045.8 428756.4 755335.2 342316.2 841775.3
## Jun 2024       572859.6 401722.8 743996.3 311128.5 834590.6
## Jul 2024       550583.4 371943.7 729223.1 277377.6 823789.2
accuracy(HW_forecast)
##                     ME     RMSE      MAE        MPE    MAPE      MASE
## Training set -2603.178 43051.87 26226.83 -0.8402477 4.26629 0.4613965
##                    ACF1
## Training set 0.06361874
plot(HW_forecast)

# The MAPE for this type of forecast is 4.26629 which is less than 10% so this is a good indicator that the Holt-Winters forecast method is overall performing well especially in the pre-COVID-19 time, and the data does not have extremely small values that impact the percentages except for the years of 2020-2021. Considering the trend and seasonal component, this model can be used to model our dataset.

# In 1 year, the time series value for the usage of the public transportation services will be 550583.4.

# The forecast is mimicking the original plot, so it is adding some noise back into the forecast, therefore the forecast takes the trend of the overall plot of the actual dataset as it looks to forecast the upcoming year.

# Accuracy Summary
Naive <- accuracy(naive_model)
MA3 <- accuracy(MA3_f)
MA6 <- accuracy(MA6_f)
MA9 <- accuracy(MA9_f)
SES <- accuracy(ses_model)
HoltWinters <- accuracy(HW_forecast)

models <- list(
  Naive = Naive,
  MA3 = MA3,
  MA6 = MA6,
  MA9 = MA9,
  SES = SES,
  HoltWinters = HoltWinters
)

accuracy_table <- do.call(rbind, lapply(names(models), function(name) {
  acc <- models[[name]][1, c("RMSE", "MAE", "MAPE")]
  df <- data.frame(
    Model = name,
    RMSE = acc["RMSE"],
    MAE  = acc["MAE"],
    MAPE = acc["MAPE"],
    stringsAsFactors = FALSE
  )
  rownames(df) <- NULL
  return(df)
}))

rownames(accuracy_table) <- NULL
accuracy_table
# I choose MAPE as my accuracy measure because it is scale-independent due to this reason we can focus on relative errors based on the initial levels for the data set making it the best measure to be considered. Based on the summary table, the Moving Average Method of the order 9 has the lowest MAPE making it the best forecast for the short-term forecast of the data otherwise Holt-Winters would be the best in order to forecast long-term trend and seasonality and the Naive Method has the largest MAPE making it the worst forecast for this data.

# Conclusion

# The time series has a steady trend and seasonality except for the sharp drop in 2020-2021 due to the COVID-19 pandemic and a reduction in mobility. For short-term future planning, the Moving Average forecast of the order 9 should be used in order to conduct resource allocation which can prevent wasted resources in low-demand/high-demand months appropriately and ensures capacity planning for the upcoming year. For long-term strategic planning, the Holt-Winters model should be used to understand the potential growth/decline over months/years taking trend and seasonality in consideration. This can be used to make any type of infrastructure improvements projects or route expansions.

# As the public transportation service is recovering from the post-pandemic effect, over the next year I anticipate for the value of the time series to go up. In 2 years, I also anticipate for the value to continue going up as more companies are implementing back to office mandates and school/colleges are back to normal schedules.